In the first cell of your notebook you must include your names, project title, and a hyperlink to your webpage at github.io; the webpage must be publicly readable on the internet (i.e., live) and must contain the same work that is in the submitted notebook. That is, the first cell of your notebook must be a markdown cell with a hyperlink to the generated webpage hosted at yourname.github.io

  • description of your project,
  • links to the data and other relevant resources,
  • a collaboration plan, and
  • the project goals.

Pitch¶

(3 Points) Slide: Your slide includes your name, a link to your website, the datasets you hope to use, and a question you are currently considering.

(3 Points) Pitch: Your pitch takes no more than 2-3 minutes, is coherent, and you participate actively in class, in person, on the day of the pitch.

Project Goals¶

The project primarily investigates health-related data for each county in the USA. The health factors here include health behaviors, clinical care, socio-economic factors, and the physical environment, alongside health outcomes. Using the available data along with additional public datasets, I plan to find out which variables are most strongly associated with health outcomes, using metrics such as correlation to compare them. Using those variables, I plan to build a model and possibly test it with new data sources.
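
One simple way to screen candidate variables is to rank them by the strength of their correlation with a chosen outcome. The sketch below is a minimal illustration on a small made-up table: the values are invented and the column names are only placeholders in the style of the dataset. Correlation alone does not show that a variable is responsible for an outcome, so this is a first screening step at most.

```python
import pandas as pd

# Hypothetical toy data: values are invented for illustration only
toy = pd.DataFrame({
    "Premature_Death": [7000.0, 8000.0, 9000.0, 10000.0, 11000.0],
    "Adult_Smoking": [0.14, 0.16, 0.19, 0.21, 0.24],
    "Some_College": [0.70, 0.66, 0.60, 0.55, 0.50],
    "Traffic_Volume": [120.0, 80.0, 150.0, 90.0, 110.0],
})

outcome = "Premature_Death"

# Pearson correlation of every other column with the outcome
corr = toy.corr()[outcome].drop(outcome)

# Rank by absolute strength so strong negative relationships are not overlooked
ranked = corr.reindex(corr.abs().sort_values(ascending=False).index)
print(ranked)
```

Ranking by absolute value keeps a strongly negative variable (here the invented Some_College column) near the top instead of at the bottom of the list.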

Collaboration Plan¶

I plan to first find more datasets that I can relate to this one, and thus gain more explanatory variables that could influence the health outcomes. Demographics, education quality, or the presence or absence of certain institutions might shed more light on the health results. GitHub will primarily be used to store all the data and notebooks.
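
Relating another county-level dataset to this one would most likely mean joining on the 5-digit FIPS code. The sketch below assumes both tables carry that code as a string column named fips; the external table, its Hospitals column, and all values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the health data, keyed by 5-digit FIPS code
health = pd.DataFrame({
    "fips": ["01001", "01003", "01005"],
    "Premature_Death": [8000.0, 8100.0, 12000.0],  # invented values
})

# Hypothetical external dataset (e.g. a table of local institutions)
extra = pd.DataFrame({
    "fips": ["01001", "01003", "01007"],
    "Hospitals": [1, 3, 1],  # invented values
})

# Keep FIPS codes as strings so leading zeros survive the join.
# A left join keeps every county in the health table; indicator=True adds a
# "_merge" column showing which rows found a match in the external data.
merged = health.merge(extra, on="fips", how="left", indicator=True)
print(merged)
```

The _merge column makes it easy to count how many counties the extra dataset fails to cover before relying on the joined columns.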

Import libraries¶

In [3]:
pip install missingno 
Collecting missingno
  Downloading missingno-0.5.2-py3-none-any.whl (8.7 kB)
Requirement already satisfied: numpy in /opt/conda/lib/python3.11/site-packages (from missingno) (1.24.4)
Requirement already satisfied: matplotlib in /opt/conda/lib/python3.11/site-packages (from missingno) (3.7.2)
Requirement already satisfied: scipy in /opt/conda/lib/python3.11/site-packages (from missingno) (1.11.2)
Requirement already satisfied: seaborn in /opt/conda/lib/python3.11/site-packages (from missingno) (0.12.2)
Requirement already satisfied: contourpy>=1.0.1 in /opt/conda/lib/python3.11/site-packages (from matplotlib->missingno) (1.1.0)
Requirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.11/site-packages (from matplotlib->missingno) (0.11.0)
Requirement already satisfied: fonttools>=4.22.0 in /opt/conda/lib/python3.11/site-packages (from matplotlib->missingno) (4.42.1)
Requirement already satisfied: kiwisolver>=1.0.1 in /opt/conda/lib/python3.11/site-packages (from matplotlib->missingno) (1.4.5)
Requirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.11/site-packages (from matplotlib->missingno) (23.1)
Requirement already satisfied: pillow>=6.2.0 in /opt/conda/lib/python3.11/site-packages (from matplotlib->missingno) (10.0.0)
Requirement already satisfied: pyparsing<3.1,>=2.3.1 in /opt/conda/lib/python3.11/site-packages (from matplotlib->missingno) (3.0.9)
Requirement already satisfied: python-dateutil>=2.7 in /opt/conda/lib/python3.11/site-packages (from matplotlib->missingno) (2.8.2)
Requirement already satisfied: pandas>=0.25 in /opt/conda/lib/python3.11/site-packages (from seaborn->missingno) (2.0.3)
Requirement already satisfied: pytz>=2020.1 in /opt/conda/lib/python3.11/site-packages (from pandas>=0.25->seaborn->missingno) (2023.3)
Requirement already satisfied: tzdata>=2022.1 in /opt/conda/lib/python3.11/site-packages (from pandas>=0.25->seaborn->missingno) (2023.3)
Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.11/site-packages (from python-dateutil>=2.7->matplotlib->missingno) (1.16.0)
Installing collected packages: missingno
Successfully installed missingno-0.5.2
Note: you may need to restart the kernel to use updated packages.
In [4]:
import numpy as np 
import pandas as pd

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

# import pycountry_convert as pc 
import missingno as mno
import warnings

Read the dataset¶

In [5]:
URL = "https://www.countyhealthrankings.org/sites/default/files/media/document/analytic_data2023_0.csv"
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:101.0) Gecko/20100101 Firefox/101.0'}
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    df = pd.read_csv(URL, storage_options=headers)
In [7]:
# with warnings.catch_warnings():
#     warnings.simplefilter('ignore')
#     df = pd.read_csv("data/analytic_data2023_0.csv")
In [6]:
df.head()
Out[6]:
| | State FIPS Code | County FIPS Code | 5-digit FIPS Code | State Abbreviation | Name | Release Year | County Ranked (Yes=1/No=0) | Premature Death raw value | Premature Death numerator | Premature Death denominator | ... | % Female raw value | % Female numerator | % Female denominator | % Female CI low | % Female CI high | % Rural raw value | % Rural numerator | % Rural denominator | % Rural CI low | % Rural CI high |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | statecode | countycode | fipscode | state | county | year | county_ranked | v001_rawvalue | v001_numerator | v001_denominator | ... | v057_rawvalue | v057_numerator | v057_denominator | v057_cilow | v057_cihigh | v058_rawvalue | v058_numerator | v058_denominator | v058_cilow | v058_cihigh |
| 1 | 00 | 000 | 00000 | US | United States | 2023 | NaN | 7281.9355638 | 4125218 | 917267406 | ... | 0.5047067187 | 167509003 | 331893745 | NaN | NaN | 0.193 | NaN | NaN | NaN | NaN |
| 2 | 01 | 000 | 01000 | AL | Alabama | 2023 | NaN | 10350.071456 | 88086 | 13668498 | ... | 0.5142542169 | 2591778 | 5039877 | NaN | NaN | 0.409631829 | 1957932 | 4779736 | NaN | NaN |
| 3 | 01 | 001 | 01001 | AL | Autauga County | 2023 | 1 | 8027.3947267 | 836 | 156081 | ... | 0.513782892 | 30362 | 59095 | NaN | NaN | 0.4200216232 | 22921 | 54571 | NaN | NaN |
| 4 | 01 | 003 | 01003 | AL | Baldwin County | 2023 | 1 | 8118.3582061 | 3377 | 614143 | ... | 0.5134771453 | 122872 | 239294 | NaN | NaN | 0.4227909911 | 77060 | 182265 | NaN | NaN |

5 rows × 720 columns
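
The preview shows that row 0 holds short variable codes (statecode, v001_rawvalue, ...) rather than data, which suggests the file carries a second header line and that every column is therefore read as text. One possible way to handle this at load time, sketched below on a tiny in-memory stand-in for the file, is to skip that line and read the FIPS columns as strings. This is an assumption about how the file could be read, not what the cells above do.

```python
import io
import pandas as pd

# Tiny stand-in for the CSV layout: a descriptive header, a code row, then data
# (the two data rows are abbreviated from the preview above)
sample = io.StringIO(
    "State FIPS Code,County FIPS Code,Name,Premature Death raw value\n"
    "statecode,countycode,county,v001_rawvalue\n"
    "01,001,Autauga County,8027.3947267\n"
    "01,003,Baldwin County,8118.3582061\n"
)

# skiprows=[1] drops the code row (0-based file line 1); reading the FIPS
# columns as strings preserves their leading zeros
fips_cols = ["State FIPS Code", "County FIPS Code"]
clean = pd.read_csv(sample, skiprows=[1], dtype={c: str for c in fips_cols})

print(clean.dtypes)
```

With the code row gone, measure columns such as Premature Death raw value are parsed as numbers directly, while the FIPS codes stay as text.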

In [7]:
df.shape
Out[7]:
(3195, 720)
In [8]:
# Display all the columns
for col in df.columns:
    print(col)
State FIPS Code
County FIPS Code
5-digit FIPS Code
State Abbreviation
Name
Release Year
County Ranked (Yes=1/No=0)
Premature Death raw value
Premature Death numerator
Premature Death denominator
Premature Death CI low
Premature Death CI high
Premature Death flag (0 = No Flag/1=Unreliable/2=Suppressed)
Premature Death (AIAN)
Premature Death CI low (AIAN)
Premature Death CI high (AIAN)
Premature Death flag (AIAN) (. = No Flag/1=Unreliable/2=Suppressed)
Premature Death (Asian/Pacific Islander)
Premature Death CI low (Asian/Pacific Islander)
Premature Death CI high (Asian/Pacific Islander)
Premature Death flag (Asian/Pacific Islander) (. = No Flag/1=Unreliable/2=Suppressed)
Premature Death (Black)
Premature Death CI low (Black)
Premature Death CI high (Black)
Premature Death flag (Black) (. = No Flag/1=Unreliable/2=Suppressed)
Premature Death (Hispanic)
Premature Death CI low (Hispanic)
Premature Death CI high (Hispanic)
Premature Death flag (Hispanic) (. = No Flag/1=Unreliable/2=Suppressed)
Premature Death (White)
Premature Death CI low (White)
Premature Death CI high (White)
Premature Death flag (White) (. = No Flag/1=Unreliable/2=Suppressed)
Poor or Fair Health raw value
Poor or Fair Health numerator
Poor or Fair Health denominator
Poor or Fair Health CI low
Poor or Fair Health CI high
Poor Physical Health Days raw value
Poor Physical Health Days numerator
Poor Physical Health Days denominator
Poor Physical Health Days CI low
Poor Physical Health Days CI high
Poor Mental Health Days raw value
Poor Mental Health Days numerator
Poor Mental Health Days denominator
Poor Mental Health Days CI low
Poor Mental Health Days CI high
Low Birthweight raw value
Low Birthweight numerator
Low Birthweight denominator
Low Birthweight CI low
Low Birthweight CI high
LBW unreliable indicator (Unreliable = Numerator < 20 or relative standard error > 20%)
Low Birthweight (AIAN)
Low Birthweight CI low (AIAN)
Low Birthweight CI high (AIAN)
Low Birthweight (Asian/Pacific Islander)
Low Birthweight CI low (Asian/Pacific Islander)
Low Birthweight CI high (Asian/Pacific Islander)
Low Birthweight (Black)
Low Birthweight CI low (Black)
Low Birthweight CI high (Black)
Low Birthweight (Hispanic)
Low Birthweight CI low (Hispanic)
Low Birthweight CI high (Hispanic)
Low Birthweight (White)
Low Birthweight CI low (White)
Low Birthweight CI high (White)
Adult Smoking raw value
Adult Smoking numerator
Adult Smoking denominator
Adult Smoking CI low
Adult Smoking CI high
Adult Obesity raw value
Adult Obesity numerator
Adult Obesity denominator
Adult Obesity CI low
Adult Obesity CI high
Food Environment Index raw value
Food Environment Index numerator
Food Environment Index denominator
Food Environment Index CI low
Food Environment Index CI high
Physical Inactivity raw value
Physical Inactivity numerator
Physical Inactivity denominator
Physical Inactivity CI low
Physical Inactivity CI high
Access to Exercise Opportunities raw value
Access to Exercise Opportunities numerator
Access to Exercise Opportunities denominator
Access to Exercise Opportunities CI low
Access to Exercise Opportunities CI high
Excessive Drinking raw value
Excessive Drinking numerator
Excessive Drinking denominator
Excessive Drinking CI low
Excessive Drinking CI high
Alcohol-Impaired Driving Deaths raw value
Alcohol-Impaired Driving Deaths numerator
Alcohol-Impaired Driving Deaths denominator
Alcohol-Impaired Driving Deaths CI low
Alcohol-Impaired Driving Deaths CI high
Sexually Transmitted Infections raw value
Sexually Transmitted Infections numerator
Sexually Transmitted Infections denominator
Sexually Transmitted Infections CI low
Sexually Transmitted Infections CI high
Teen Births raw value
Teen Births numerator
Teen Births denominator
Teen Births CI low
Teen Births CI high
Teen Births (AIAN)
Teen Births CI low (AIAN)
Teen Births CI high (AIAN)
Teen Births (Asian/Pacific Islander)
Teen Births CI low (Asian/Pacific Islander)
Teen Births CI high (Asian/Pacific Islander)
Teen Births (Black)
Teen Births CI low (Black)
Teen Births CI high (Black)
Teen Births (Hispanic)
Teen Births CI low (Hispanic)
Teen Births CI high (Hispanic)
Teen Births (White)
Teen Births CI low (White)
Teen Births CI high (White)
Uninsured raw value
Uninsured numerator
Uninsured denominator
Uninsured CI low
Uninsured CI high
Primary Care Physicians raw value
Primary Care Physicians numerator
Primary Care Physicians denominator
Primary Care Physicians CI low
Primary Care Physicians CI high
Ratio of population to primary care physicians.
Dentists raw value
Dentists numerator
Dentists denominator
Dentists CI low
Dentists CI high
Ratio of population to dentists.
Mental Health Providers raw value
Mental Health Providers numerator
Mental Health Providers denominator
Mental Health Providers CI low
Mental Health Providers CI high
Ratio of population to mental health providers.
Preventable Hospital Stays raw value
Preventable Hospital Stays numerator
Preventable Hospital Stays denominator
Preventable Hospital Stays CI low
Preventable Hospital Stays CI high
Preventable Hospital Stays (AIAN)
Preventable Hospital Stays (Asian/Pacific Islander)
Preventable Hospital Stays (Black)
Preventable Hospital Stays (Hispanic)
Preventable Hospital Stays (White)
Mammography Screening raw value
Mammography Screening numerator
Mammography Screening denominator
Mammography Screening CI low
Mammography Screening CI high
Mammography Screening (AIAN)
Mammography Screening (Asian/Pacific Islander)
Mammography Screening (Black)
Mammography Screening (Hispanic)
Mammography Screening (White)
Flu Vaccinations raw value
Flu Vaccinations numerator
Flu Vaccinations denominator
Flu Vaccinations CI low
Flu Vaccinations CI high
Flu Vaccinations (AIAN)
Flu Vaccinations (Asian/Pacific Islander)
Flu Vaccinations (Black)
Flu Vaccinations (Hispanic)
Flu Vaccinations (White)
High School Completion raw value
High School Completion numerator
High School Completion denominator
High School Completion CI low
High School Completion CI high
Some College raw value
Some College numerator
Some College denominator
Some College CI low
Some College CI high
Unemployment raw value
Unemployment numerator
Unemployment denominator
Unemployment CI low
Unemployment CI high
Children in Poverty raw value
Children in Poverty numerator
Children in Poverty denominator
Children in Poverty CI low
Children in Poverty CI high
Children in Poverty (AIAN)
Children in Poverty CI low (AIAN)
Children in Poverty CI high (AIAN)
Children in Poverty (Asian/Pacific Islander)
Children in Poverty CI low (Asian/Pacific Islander)
Children in Poverty CI high (Asian/Pacific Islander)
Children in Poverty (Black)
Children in Poverty CI low (Black)
Children in Poverty CI high (Black)
Children in Poverty (Hispanic)
Children in Poverty CI low (Hispanic)
Children in Poverty CI high (Hispanic)
Children in Poverty (White)
Children in Poverty CI low (White)
Children in Poverty CI high (White)
Income Inequality raw value
Income Inequality numerator
Income Inequality denominator
Income Inequality CI low
Income Inequality CI high
Children in Single-Parent Households raw value
Children in Single-Parent Households numerator
Children in Single-Parent Households denominator
Children in Single-Parent Households CI low
Children in Single-Parent Households CI high
Social Associations raw value
Social Associations numerator
Social Associations denominator
Social Associations CI low
Social Associations CI high
Injury Deaths raw value
Injury Deaths numerator
Injury Deaths denominator
Injury Deaths CI low
Injury Deaths CI high
Injury Deaths (AIAN)
Injury Deaths CI low (AIAN)
Injury Deaths CI high (AIAN)
Injury Deaths (Asian/Pacific Islander)
Injury Deaths CI low (Asian/Pacific Islander)
Injury Deaths CI high (Asian/Pacific Islander)
Injury Deaths (Black)
Injury Deaths CI low (Black)
Injury Deaths CI high (Black)
Injury Deaths (Hispanic)
Injury Deaths CI low (Hispanic)
Injury Deaths CI high (Hispanic)
Injury Deaths (White)
Injury Deaths CI low (White)
Injury Deaths CI high (White)
Air Pollution - Particulate Matter raw value
Air Pollution - Particulate Matter numerator
Air Pollution - Particulate Matter denominator
Air Pollution - Particulate Matter CI low
Air Pollution - Particulate Matter CI high
Drinking Water Violations raw value
Drinking Water Violations numerator
Drinking Water Violations denominator
Drinking Water Violations CI low
Drinking Water Violations CI high
Severe Housing Problems raw value
Severe Housing Problems numerator
Severe Housing Problems denominator
Severe Housing Problems CI low
Severe Housing Problems CI high
Percentage of households with high housing costs
Percentage of households with high housing costs CI low
Percentage of households with high housing costs CI high
Percentage of households with overcrowding
Percentage of households with overcrowding CI low
Percentage of households with overcrowding CI high
Percentage of households with lack of kitchen or plumbing facilities
Percentage of households with lack of kitchen or plumbing facilities CI low
Percentage of households with lack of kitchen or plumbing facilities CI high
Driving Alone to Work raw value
Driving Alone to Work numerator
Driving Alone to Work denominator
Driving Alone to Work CI low
Driving Alone to Work CI high
Driving Alone to Work (AIAN)
Driving Alone to Work CI low (AIAN)
Driving Alone to Work CI high (AIAN)
Driving Alone to Work (Asian/Pacific Islander)
Driving Alone to Work CI low (Asian/Pacific Islander)
Driving Alone to Work CI high (Asian/Pacific Islander)
Driving Alone to Work (Black)
Driving Alone to Work CI low (Black)
Driving Alone to Work CI high (Black)
Driving Alone to Work (Hispanic)
Driving Alone to Work CI low (Hispanic)
Driving Alone to Work CI high (Hispanic)
Driving Alone to Work (White)
Driving Alone to Work CI low (White)
Driving Alone to Work CI high (White)
Long Commute - Driving Alone raw value
Long Commute - Driving Alone numerator
Long Commute - Driving Alone denominator
Long Commute - Driving Alone CI low
Long Commute - Driving Alone CI high
Life Expectancy raw value
Life Expectancy numerator
Life Expectancy denominator
Life Expectancy CI low
Life Expectancy CI high
Life Expectancy (AIAN)
Life Expectancy CI low (AIAN)
Life Expectancy CI high (AIAN)
Life Expectancy (Asian/Pacific Islander)
Life Expectancy CI low (Asian/Pacific Islander)
Life Expectancy CI high (Asian/Pacific Islander)
Life Expectancy (Black)
Life Expectancy CI low (Black)
Life Expectancy CI high (Black)
Life Expectancy (Hispanic)
Life Expectancy CI low (Hispanic)
Life Expectancy CI high (Hispanic)
Life Expectancy (White)
Life Expectancy CI low (White)
Life Expectancy CI high (White)
Premature Age-Adjusted Mortality raw value
Premature Age-Adjusted Mortality numerator
Premature Age-Adjusted Mortality denominator
Premature Age-Adjusted Mortality CI low
Premature Age-Adjusted Mortality CI high
Premature Age-Adjusted Mortality (AIAN)
Premature Age-Adjusted Mortality CI low (AIAN)
Premature Age-Adjusted Mortality CI high (AIAN)
Premature Age-Adjusted Mortality (Asian/Pacific Islander)
Premature Age-Adjusted Mortality CI low (Asian/Pacific Islander)
Premature Age-Adjusted Mortality CI high (Asian/Pacific Islander)
Premature Age-Adjusted Mortality (Black)
Premature Age-Adjusted Mortality CI low (Black)
Premature Age-Adjusted Mortality CI high (Black)
Premature Age-Adjusted Mortality (Hispanic)
Premature Age-Adjusted Mortality CI low (Hispanic)
Premature Age-Adjusted Mortality CI high (Hispanic)
Premature Age-Adjusted Mortality (White)
Premature Age-Adjusted Mortality CI low (White)
Premature Age-Adjusted Mortality CI high (White)
Child Mortality raw value
Child Mortality numerator
Child Mortality denominator
Child Mortality CI low
Child Mortality CI high
Child Mortality (AIAN)
Child Mortality CI low (AIAN)
Child Mortality CI high (AIAN)
Child Mortality (Asian/Pacific Islander)
Child Mortality CI low (Asian/Pacific Islander)
Child Mortality CI high (Asian/Pacific Islander)
Child Mortality (Black)
Child Mortality CI low (Black)
Child Mortality CI high (Black)
Child Mortality (Hispanic)
Child Mortality CI low (Hispanic)
Child Mortality CI high (Hispanic)
Child Mortality (White)
Child Mortality CI low (White)
Child Mortality CI high (White)
Infant Mortality raw value
Infant Mortality numerator
Infant Mortality denominator
Infant Mortality CI low
Infant Mortality CI high
Infant Mortality (AIAN)
Infant Mortality CI low (AIAN)
Infant Mortality CI high (AIAN)
Infant Mortality (Asian/Pacific Islander)
Infant Mortality CI low (Asian/Pacific Islander)
Infant Mortality CI high (Asian/Pacific Islander)
Infant Mortality (Black)
Infant Mortality CI low (Black)
Infant Mortality CI high (Black)
Infant Mortality (Hispanic)
Infant Mortality CI low (Hispanic)
Infant Mortality CI high (Hispanic)
Infant Mortality (White)
Infant Mortality CI low (White)
Infant Mortality CI high (White)
Frequent Physical Distress raw value
Frequent Physical Distress numerator
Frequent Physical Distress denominator
Frequent Physical Distress CI low
Frequent Physical Distress CI high
Frequent Mental Distress raw value
Frequent Mental Distress numerator
Frequent Mental Distress denominator
Frequent Mental Distress CI low
Frequent Mental Distress CI high
Diabetes Prevalence raw value
Diabetes Prevalence numerator
Diabetes Prevalence denominator
Diabetes Prevalence CI low
Diabetes Prevalence CI high
HIV Prevalence raw value
HIV Prevalence numerator
HIV Prevalence denominator
HIV Prevalence CI low
HIV Prevalence CI high
Food Insecurity raw value
Food Insecurity numerator
Food Insecurity denominator
Food Insecurity CI low
Food Insecurity CI high
Limited Access to Healthy Foods raw value
Limited Access to Healthy Foods numerator
Limited Access to Healthy Foods denominator
Limited Access to Healthy Foods CI low
Limited Access to Healthy Foods CI high
Drug Overdose Deaths raw value
Drug Overdose Deaths numerator
Drug Overdose Deaths denominator
Drug Overdose Deaths CI low
Drug Overdose Deaths CI high
Drug Overdose Deaths (AIAN)
Drug Overdose Deaths CI low (AIAN)
Drug Overdose Deaths CI high (AIAN)
Drug Overdose Deaths (Asian/Pacific Islander)
Drug Overdose Deaths CI low (Asian/Pacific Islander)
Drug Overdose Deaths CI high (Asian/Pacific Islander)
Drug Overdose Deaths (Black)
Drug Overdose Deaths CI low (Black)
Drug Overdose Deaths CI high (Black)
Drug Overdose Deaths (Hispanic)
Drug Overdose Deaths CI low (Hispanic)
Drug Overdose Deaths CI high (Hispanic)
Drug Overdose Deaths (White)
Drug Overdose Deaths CI low (White)
Drug Overdose Deaths CI high (White)
Insufficient Sleep raw value
Insufficient Sleep numerator
Insufficient Sleep denominator
Insufficient Sleep CI low
Insufficient Sleep CI high
Uninsured Adults raw value
Uninsured Adults numerator
Uninsured Adults denominator
Uninsured Adults CI low
Uninsured Adults CI high
Uninsured Children raw value
Uninsured Children numerator
Uninsured Children denominator
Uninsured Children CI low
Uninsured Children CI high
Other Primary Care Providers raw value
Other Primary Care Providers numerator
Other Primary Care Providers denominator
Other Primary Care Providers CI low
Other Primary Care Providers CI high
Ratio of population to primary care providers other than physicians.
High School Graduation raw value
High School Graduation numerator
High School Graduation denominator
High School Graduation CI low
High School Graduation CI high
Disconnected Youth raw value
Disconnected Youth numerator
Disconnected Youth denominator
Disconnected Youth CI low
Disconnected Youth CI high
Reading Scores raw value
Reading Scores numerator
Reading Scores denominator
Reading Scores CI low
Reading Scores CI high
Reading Scores (AIAN)
Reading Scores (Asian/Pacific Islander)
Reading Scores (Black)
Reading Scores (Hispanic)
Reading Scores (White)
Math Scores raw value
Math Scores numerator
Math Scores denominator
Math Scores CI low
Math Scores CI high
Math Scores (AIAN)
Math Scores (Asian/Pacific Islander)
Math Scores (Black)
Math Scores (Hispanic)
Math Scores (White)
School Segregation raw value
School Segregation numerator
School Segregation denominator
School Segregation CI low
School Segregation CI high
School Funding Adequacy raw value
School Funding Adequacy numerator
School Funding Adequacy denominator
School Funding Adequacy CI low
School Funding Adequacy CI high
Gender Pay Gap raw value
Gender Pay Gap numerator
Gender Pay Gap denominator
Gender Pay Gap CI low
Gender Pay Gap CI high
Median Household Income raw value
Median Household Income numerator
Median Household Income denominator
Median Household Income CI low
Median Household Income CI high
Median Household Income (AIAN)
Median Household Income CI low (AIAN)
Median Household Income CI high (AIAN)
Median household income (Asian)
Median household income CI low (Asian)
Median household income CI high (Asian)
Median Household Income (Black)
Median Household Income CI low (Black)
Median Household Income CI high (Black)
Median Household Income (Hispanic)
Median Household Income CI low (Hispanic)
Median Household Income CI high (Hispanic)
Median Household Income (White)
Median Household Income CI low (White)
Median Household Income CI high (White)
Living Wage raw value
Living Wage numerator
Living Wage denominator
Living Wage CI low
Living Wage CI high
Children Eligible for Free or Reduced Price Lunch raw value
Children Eligible for Free or Reduced Price Lunch numerator
Children Eligible for Free or Reduced Price Lunch denominator
Children Eligible for Free or Reduced Price Lunch CI low
Children Eligible for Free or Reduced Price Lunch CI high
Residential Segregation - Black/White raw value
Residential Segregation - Black/White numerator
Residential Segregation - Black/White denominator
Residential Segregation - Black/White CI low
Residential Segregation - Black/White CI high
Child Care Cost Burden raw value
Child Care Cost Burden numerator
Child Care Cost Burden denominator
Child Care Cost Burden CI low
Child Care Cost Burden CI high
Child Care Centers raw value
Child Care Centers numerator
Child Care Centers denominator
Child Care Centers CI low
Child Care Centers CI high
Homicides raw value
Homicides numerator
Homicides denominator
Homicides CI low
Homicides CI high
Homicides (AIAN)
Homicides CI low (AIAN)
Homicides CI high (AIAN)
Homicides (Asian/Pacific Islander)
Homicides CI low (Asian/Pacific Islander)
Homicides CI high (Asian/Pacific Islander)
Homicides (Black)
Homicides CI low (Black)
Homicides CI high (Black)
Homicides (Hispanic)
Homicides CI low (Hispanic)
Homicides CI high (Hispanic)
Homicides (White)
Homicides CI low (White)
Homicides CI high (White)
Suicides raw value
Suicides numerator
Suicides denominator
Suicides CI low
Suicides CI high
Crude suicide rate
Suicides (AIAN)
Suicides CI low (AIAN)
Suicides CI high (AIAN)
Suicides (Asian/Pacific Islander)
Suicides CI low (Asian/Pacific Islander)
Suicides CI high (Asian/Pacific Islander)
Suicides (Black)
Suicides CI low (Black)
Suicides CI high (Black)
Suicides (Hispanic)
Suicides CI low (Hispanic)
Suicides CI high (Hispanic)
Suicides (White)
Suicides CI low (White)
Suicides CI high (White)
Firearm Fatalities raw value
Firearm Fatalities numerator
Firearm Fatalities denominator
Firearm Fatalities CI low
Firearm Fatalities CI high
Firearm Fatalities (AIAN)
Firearm Fatalities CI low (AIAN)
Firearm Fatalities CI high (AIAN)
Firearm Fatalities (Asian/Pacific Islander)
Firearm Fatalities CI low (Asian/Pacific Islander)
Firearm Fatalities CI high (Asian/Pacific Islander)
Firearm Fatalities (Black)
Firearm Fatalities CI low (Black)
Firearm Fatalities CI high (Black)
Firearm Fatalities (Hispanic)
Firearm Fatalities CI low (Hispanic)
Firearm Fatalities CI high (Hispanic)
Firearm Fatalities (White)
Firearm Fatalities CI low (White)
Firearm Fatalities CI high (White)
Motor Vehicle Crash Deaths raw value
Motor Vehicle Crash Deaths numerator
Motor Vehicle Crash Deaths denominator
Motor Vehicle Crash Deaths CI low
Motor Vehicle Crash Deaths CI high
Motor Vehicle Crash Deaths (AIAN)
Motor Vehicle Crash Deaths CI low (AIAN)
Motor Vehicle Crash Deaths CI high (AIAN)
Motor Vehicle Crash Deaths (Asian/Pacific Islander)
Motor Vehicle Crash Deaths CI low (Asian/Pacific Islander)
Motor Vehicle Crash Deaths CI high (Asian/Pacific Islander)
Motor Vehicle Crash Deaths (Black)
Motor Vehicle Crash Deaths CI low (Black)
Motor Vehicle Crash Deaths CI high (Black)
Motor Vehicle Crash Deaths (Hispanic)
Motor Vehicle Crash Deaths CI low (Hispanic)
Motor Vehicle Crash Deaths CI high (Hispanic)
Motor Vehicle Crash Deaths (White)
Motor Vehicle Crash Deaths CI low (White)
Motor Vehicle Crash Deaths CI high (White)
Juvenile Arrests raw value
Juvenile Arrests numerator
Juvenile Arrests denominator
Juvenile Arrests CI low
Juvenile Arrests CI high
Number of juvenile delinquency cases formally processed by a juvenile court
Number of informally handled juvenile delinquency cases
Voter Turnout raw value
Voter Turnout numerator
Voter Turnout denominator
Voter Turnout CI low
Voter Turnout CI high
Census Participation raw value
Census Participation numerator
Census Participation denominator
Census Participation CI low
Census Participation CI high
Traffic Volume raw value
Traffic Volume numerator
Traffic Volume denominator
Traffic Volume CI low
Traffic Volume CI high
Homeownership raw value
Homeownership numerator
Homeownership denominator
Homeownership CI low
Homeownership CI high
Severe Housing Cost Burden raw value
Severe Housing Cost Burden numerator
Severe Housing Cost Burden denominator
Severe Housing Cost Burden CI low
Severe Housing Cost Burden CI high
Broadband Access raw value
Broadband Access numerator
Broadband Access denominator
Broadband Access CI low
Broadband Access CI high
Population raw value
Population numerator
Population denominator
Population CI low
Population CI high
% Below 18 Years of Age raw value
% Below 18 Years of Age numerator
% Below 18 Years of Age denominator
% Below 18 Years of Age CI low
% Below 18 Years of Age CI high
% 65 and Older raw value
% 65 and Older numerator
% 65 and Older denominator
% 65 and Older CI low
% 65 and Older CI high
% Non-Hispanic Black raw value
% Non-Hispanic Black numerator
% Non-Hispanic Black denominator
% Non-Hispanic Black CI low
% Non-Hispanic Black CI high
% American Indian or Alaska Native raw value
% American Indian or Alaska Native numerator
% American Indian or Alaska Native denominator
% American Indian or Alaska Native CI low
% American Indian or Alaska Native CI high
% Asian raw value
% Asian numerator
% Asian denominator
% Asian CI low
% Asian CI high
% Native Hawaiian or Other Pacific Islander raw value
% Native Hawaiian or Other Pacific Islander numerator
% Native Hawaiian or Other Pacific Islander denominator
% Native Hawaiian or Other Pacific Islander CI low
% Native Hawaiian or Other Pacific Islander CI high
% Hispanic raw value
% Hispanic numerator
% Hispanic denominator
% Hispanic CI low
% Hispanic CI high
% Non-Hispanic White raw value
% Non-Hispanic White numerator
% Non-Hispanic White denominator
% Non-Hispanic White CI low
% Non-Hispanic White CI high
% Not Proficient in English raw value
% Not Proficient in English numerator
% Not Proficient in English denominator
% Not Proficient in English CI low
% Not Proficient in English CI high
% Female raw value
% Female numerator
% Female denominator
% Female CI low
% Female CI high
% Rural raw value
% Rural numerator
% Rural denominator
% Rural CI low
% Rural CI high

About dataset¶

health_graph.png

This page describes the idea behind the dataset. This link has the datasets from different years available for download. The dataset has 700+ features to work with, although many columns overlap in meaning and some data is missing.

Primarily, the data columns can be divided into health factors and health outcomes.
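
The factor/outcome split could be recorded explicitly so that later analysis can refer to it. The sketch below is only an illustration: the set of outcome measures is my assumption and should be checked against the dataset documentation, and split_columns is a hypothetical helper. Everything that is not listed as an outcome is treated as a factor, which also sweeps in demographic columns such as Population.

```python
# Assumed outcome measures (to be verified against the dataset documentation)
OUTCOMES = {
    "Premature Death raw value",
    "Poor or Fair Health raw value",
    "Poor Physical Health Days raw value",
    "Poor Mental Health Days raw value",
    "Low Birthweight raw value",
}

def split_columns(columns):
    """Split raw-value column names into (outcomes, factors)."""
    raw = [c for c in columns if c.endswith("raw value")]
    outcomes = [c for c in raw if c in OUTCOMES]
    factors = [c for c in raw if c not in OUTCOMES]
    return outcomes, factors

# A few column names taken from the listing above
example_columns = [
    "Name",
    "Premature Death raw value",
    "Premature Death numerator",
    "Adult Smoking raw value",
    "Uninsured raw value",
    "Low Birthweight raw value",
]
outcomes, factors = split_columns(example_columns)
print(outcomes)
print(factors)
```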

Data Cleaning¶

In [9]:
# This plot shows how complete each column is
# The longer the bar, the less missing data
mno.bar(df)
Out[9]:
<Axes: >

Drop the columns with more than 1000 missing values¶

In [10]:
for col in df.columns:
    if df[col].isnull().sum()>1000:
        df.drop([col], axis=1, inplace=True)
In [11]:
# The number of columns drops from 720 to 326
df.shape
Out[11]:
(3195, 326)
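
The same filter can also be written without a loop. The sketch below uses a small invented frame in place of df and a cutoff of 2 in place of 1000. It counts missing values per column, then drops columns either with a boolean mask or with dropna and its thresh parameter. It is an alternative formulation, not the code that produced the shape above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for df
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [1.0, np.nan, np.nan, 4.0],
    "c": [np.nan, np.nan, np.nan, 4.0],
})

max_missing = 2  # plays the role of the 1000 cutoff used above

# Missing values per column, most incomplete first
n_missing = toy.isnull().sum().sort_values(ascending=False)
print(n_missing)

# Keep columns with at most max_missing missing values
kept_mask = toy.loc[:, toy.isnull().sum() <= max_missing]

# thresh is the minimum number of non-missing values a column must have,
# so "at most max_missing missing" becomes "at least len - max_missing present"
kept = toy.dropna(axis=1, thresh=len(toy) - max_missing)
print(list(kept.columns))
```

Expressing the cutoff as a share of the rows (for example toy.isnull().mean()) may transfer better to releases with a different number of counties.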
In [9]:
mno.bar(df)
Out[9]:
<Axes: >

Extract the necessary columns¶

Many columns carry repetitive information. So, we extract only the ones that are enough to represent each measurement.

In [13]:
# We need the raw values only
new_cols = [x for x in df.columns if "raw value" in x]
new_cols = list(df.columns[0:5]) + new_cols
In [14]:
# Build clean column names: spell out "%", drop the " raw value" suffix,
# and replace hyphens and spaces with underscores
cols = [
    x.replace("%", "percent").replace("-", " ").replace(" raw value", "").replace(" ", "_")
    for x in new_cols
]
cols
Out[14]:
['State_FIPS_Code',
 'County_FIPS_Code',
 '5_digit_FIPS_Code',
 'State_Abbreviation',
 'Name',
 'Premature_Death',
 'Poor_or_Fair_Health',
 'Poor_Physical_Health_Days',
 'Poor_Mental_Health_Days',
 'Low_Birthweight',
 'Adult_Smoking',
 'Adult_Obesity',
 'Food_Environment_Index',
 'Physical_Inactivity',
 'Access_to_Exercise_Opportunities',
 'Excessive_Drinking',
 'Alcohol_Impaired_Driving_Deaths',
 'Sexually_Transmitted_Infections',
 'Teen_Births',
 'Uninsured',
 'Primary_Care_Physicians',
 'Dentists',
 'Mental_Health_Providers',
 'Preventable_Hospital_Stays',
 'Mammography_Screening',
 'Flu_Vaccinations',
 'High_School_Completion',
 'Some_College',
 'Unemployment',
 'Children_in_Poverty',
 'Income_Inequality',
 'Children_in_Single_Parent_Households',
 'Social_Associations',
 'Injury_Deaths',
 'Air_Pollution___Particulate_Matter',
 'Drinking_Water_Violations',
 'Severe_Housing_Problems',
 'Driving_Alone_to_Work',
 'Long_Commute___Driving_Alone',
 'Life_Expectancy',
 'Premature_Age_Adjusted_Mortality',
 'Frequent_Physical_Distress',
 'Frequent_Mental_Distress',
 'Diabetes_Prevalence',
 'HIV_Prevalence',
 'Food_Insecurity',
 'Limited_Access_to_Healthy_Foods',
 'Insufficient_Sleep',
 'Uninsured_Adults',
 'Uninsured_Children',
 'Other_Primary_Care_Providers',
 'High_School_Graduation',
 'Reading_Scores',
 'Math_Scores',
 'School_Segregation',
 'School_Funding_Adequacy',
 'Gender_Pay_Gap',
 'Median_Household_Income',
 'Children_Eligible_for_Free_or_Reduced_Price_Lunch',
 'Child_Care_Cost_Burden',
 'Child_Care_Centers',
 'Suicides',
 'Firearm_Fatalities',
 'Motor_Vehicle_Crash_Deaths',
 'Voter_Turnout',
 'Census_Participation',
 'Traffic_Volume',
 'Homeownership',
 'Severe_Housing_Cost_Burden',
 'Broadband_Access',
 'Population',
 'percent_Below_18_Years_of_Age',
 'percent_65_and_Older',
 'percent_Non_Hispanic_Black',
 'percent_American_Indian_or_Alaska_Native',
 'percent_Asian',
 'percent_Native_Hawaiian_or_Other_Pacific_Islander',
 'percent_Hispanic',
 'percent_Non_Hispanic_White',
 'percent_Not_Proficient_in_English',
 'percent_Female',
 'percent_Rural']
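Names such as `Air_Pollution___Particulate_Matter` carry three underscores because a hyphen surrounded by spaces turns into three separate underscores. As an alternative, a regular expression can collapse each run of spaces and hyphens into one underscore. This is only a sketch: the function name `clean_column_name` is hypothetical, and the example headers are assumed from the cleaned names above. Using it would produce slightly different column names than the ones the rest of this notebook relies on.

```python
import re

def clean_column_name(name):
    """Normalize one raw column header into an underscore-separated identifier."""
    name = name.replace("%", "percent")
    name = name.replace(" raw value", "")
    # Collapse every run of whitespace and hyphens into a single underscore.
    return re.sub(r"[\s\-]+", "_", name.strip())

# Example headers, assumed to resemble the raw file's headers.
examples = [
    "Air Pollution - Particulate Matter raw value",
    "% Below 18 Years of Age raw value",
    "Premature Death raw value",
]
cleaned = [clean_column_name(x) for x in examples]
print(cleaned)
```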
In [15]:
# Slice the dataframe
df = df[new_cols]
# Rename the columns
df = df.rename(columns=dict(zip(new_cols, cols)))
In [16]:
df.head(2)
Out[16]:
State_FIPS_Code County_FIPS_Code 5_digit_FIPS_Code State_Abbreviation Name Premature_Death Poor_or_Fair_Health Poor_Physical_Health_Days Poor_Mental_Health_Days Low_Birthweight ... percent_65_and_Older percent_Non_Hispanic_Black percent_American_Indian_or_Alaska_Native percent_Asian percent_Native_Hawaiian_or_Other_Pacific_Islander percent_Hispanic percent_Non_Hispanic_White percent_Not_Proficient_in_English percent_Female percent_Rural
0 statecode countycode fipscode state county v001_rawvalue v002_rawvalue v036_rawvalue v042_rawvalue v037_rawvalue ... v053_rawvalue v054_rawvalue v055_rawvalue v081_rawvalue v080_rawvalue v056_rawvalue v126_rawvalue v059_rawvalue v057_rawvalue v058_rawvalue
1 00 000 00000 US United States 7281.9355638 0.12 3 4.4 0.0819065527 ... 0.1682705801 0.1261202919 0.0131594526 0.0613162595 0.0026003593 0.1887563262 0.5930615866 0.0410440385 0.5047067187 0.193

2 rows × 82 columns

In [17]:
# Remove the first row, which holds the variable codes rather than data
df = df.drop([0])
df = df.reset_index(drop=True)
df.head(2)
Out[17]:
State_FIPS_Code County_FIPS_Code 5_digit_FIPS_Code State_Abbreviation Name Premature_Death Poor_or_Fair_Health Poor_Physical_Health_Days Poor_Mental_Health_Days Low_Birthweight ... percent_65_and_Older percent_Non_Hispanic_Black percent_American_Indian_or_Alaska_Native percent_Asian percent_Native_Hawaiian_or_Other_Pacific_Islander percent_Hispanic percent_Non_Hispanic_White percent_Not_Proficient_in_English percent_Female percent_Rural
0 00 000 00000 US United States 7281.9355638 0.12 3 4.4 0.0819065527 ... 0.1682705801 0.1261202919 0.0131594526 0.0613162595 0.0026003593 0.1887563262 0.5930615866 0.0410440385 0.5047067187 0.193
1 01 000 01000 AL Alabama 10350.071456 0.189 3.4824161407 5.0732772786 0.1043276003 ... 0.1763568833 0.2651199623 0.0071444204 0.0155043466 0.0010883202 0.0478519615 0.6487709918 0.0102759588 0.5142542169 0.409631829

2 rows × 82 columns

In [18]:
# Checking the states
df["State_Abbreviation"].unique()
Out[18]:
array(['US', 'AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'DC', 'FL',
       'GA', 'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', 'MD',
       'MA', 'MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NV', 'NH', 'NJ', 'NM',
       'NY', 'NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN',
       'TX', 'UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY'], dtype=object)
In [19]:
df[df["State_Abbreviation"] =="WY"].head(3)
Out[19]:
State_FIPS_Code County_FIPS_Code 5_digit_FIPS_Code State_Abbreviation Name Premature_Death Poor_or_Fair_Health Poor_Physical_Health_Days Poor_Mental_Health_Days Low_Birthweight ... percent_65_and_Older percent_Non_Hispanic_Black percent_American_Indian_or_Alaska_Native percent_Asian percent_Native_Hawaiian_or_Other_Pacific_Islander percent_Hispanic percent_Non_Hispanic_White percent_Not_Proficient_in_English percent_Female percent_Rural
3170 56 0 56000 WY Wyoming 7809.903503 0.115 2.698914 4.130766 0.090792 ... 0.179469 0.010394 0.028395 0.010935 0.001012 0.10554 0.833306 0.006424 0.48823 0.35242
3171 56 1 56001 WY Albany County 5133.53187 0.11 2.90064 4.179786 0.085394 ... 0.129866 0.012949 0.013162 0.034567 0.001409 0.101627 0.821581 0.006262 0.47817 0.119397
3172 56 3 56003 WY Big Horn County 9097.45733 0.123 2.998264 3.865339 0.069968 ... 0.217675 0.007479 0.018054 0.005416 0.000516 0.096114 0.867435 0.015205 0.491145 1.0

3 rows × 82 columns

The row where State_Abbreviation is US holds the national average, and the row whose Name is the state itself holds the state average.

County_FIPS_Code is 0 for these national and state-level rows.
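Because aggregate rows and county rows share one table, it can help to separate them before computing county-level statistics. A minimal sketch on toy rows that mimic the layout described above; the variable names `national_rows`, `state_rows`, and `county_rows` are illustrative only.

```python
import pandas as pd

# Toy rows mimicking the layout of the real table.
toy = pd.DataFrame({
    "State_Abbreviation": ["US", "AL", "AL", "WY", "WY"],
    "County_FIPS_Code": ["000", "000", "001", "000", "001"],
    "Name": ["United States", "Alabama", "Autauga County",
             "Wyoming", "Albany County"],
})

# A county code of 0 marks an aggregate (national or state) row.
is_aggregate = pd.to_numeric(toy["County_FIPS_Code"]) == 0
is_national = toy["State_Abbreviation"] == "US"

national_rows = toy[is_national]
state_rows = toy[is_aggregate & ~is_national]
county_rows = toy[~is_aggregate]

print(len(national_rows), len(state_rows), len(county_rows))
```

Keeping the aggregates out of the county table avoids counting the same population more than once in summaries and correlations.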

Correct the data types¶

In [20]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3194 entries, 0 to 3193
Data columns (total 82 columns):
 #   Column                                             Non-Null Count  Dtype 
---  ------                                             --------------  ----- 
 0   State_FIPS_Code                                    3194 non-null   object
 1   County_FIPS_Code                                   3194 non-null   object
 2   5_digit_FIPS_Code                                  3194 non-null   object
 3   State_Abbreviation                                 3194 non-null   object
 4   Name                                               3194 non-null   object
 5   Premature_Death                                    3134 non-null   object
 6   Poor_or_Fair_Health                                3192 non-null   object
 7   Poor_Physical_Health_Days                          3192 non-null   object
 8   Poor_Mental_Health_Days                            3192 non-null   object
 9   Low_Birthweight                                    3088 non-null   object
 10  Adult_Smoking                                      3192 non-null   object
 11  Adult_Obesity                                      3192 non-null   object
 12  Food_Environment_Index                             3161 non-null   object
 13  Physical_Inactivity                                3192 non-null   object
 14  Access_to_Exercise_Opportunities                   3132 non-null   object
 15  Excessive_Drinking                                 3192 non-null   object
 16  Alcohol_Impaired_Driving_Deaths                    3167 non-null   object
 17  Sexually_Transmitted_Infections                    3071 non-null   object
 18  Teen_Births                                        3005 non-null   object
 19  Uninsured                                          3193 non-null   object
 20  Primary_Care_Physicians                            3047 non-null   object
 21  Dentists                                           3108 non-null   object
 22  Mental_Health_Providers                            2993 non-null   object
 23  Preventable_Hospital_Stays                         3123 non-null   object
 24  Mammography_Screening                              3173 non-null   object
 25  Flu_Vaccinations                                   3176 non-null   object
 26  High_School_Completion                             3194 non-null   object
 27  Some_College                                       3194 non-null   object
 28  Unemployment                                       3193 non-null   object
 29  Children_in_Poverty                                3193 non-null   object
 30  Income_Inequality                                  3187 non-null   object
 31  Children_in_Single_Parent_Households               3193 non-null   object
 32  Social_Associations                                3194 non-null   object
 33  Injury_Deaths                                      3089 non-null   object
 34  Air_Pollution___Particulate_Matter                 3167 non-null   object
 35  Drinking_Water_Violations                          3149 non-null   object
 36  Severe_Housing_Problems                            3194 non-null   object
 37  Driving_Alone_to_Work                              3194 non-null   object
 38  Long_Commute___Driving_Alone                       3194 non-null   object
 39  Life_Expectancy                                    3124 non-null   object
 40  Premature_Age_Adjusted_Mortality                   3134 non-null   object
 41  Frequent_Physical_Distress                         3192 non-null   object
 42  Frequent_Mental_Distress                           3192 non-null   object
 43  Diabetes_Prevalence                                3192 non-null   object
 44  HIV_Prevalence                                     2735 non-null   object
 45  Food_Insecurity                                    3194 non-null   object
 46  Limited_Access_to_Healthy_Foods                    3161 non-null   object
 47  Insufficient_Sleep                                 3192 non-null   object
 48  Uninsured_Adults                                   3193 non-null   object
 49  Uninsured_Children                                 3193 non-null   object
 50  Other_Primary_Care_Providers                       3183 non-null   object
 51  High_School_Graduation                             2362 non-null   object
 52  Reading_Scores                                     2826 non-null   object
 53  Math_Scores                                        2739 non-null   object
 54  School_Segregation                                 2962 non-null   object
 55  School_Funding_Adequacy                            3133 non-null   object
 56  Gender_Pay_Gap                                     3187 non-null   object
 57  Median_Household_Income                            3192 non-null   object
 58  Children_Eligible_for_Free_or_Reduced_Price_Lunch  2606 non-null   object
 59  Child_Care_Cost_Burden                             3192 non-null   object
 60  Child_Care_Centers                                 3044 non-null   object
 61  Suicides                                           2485 non-null   object
 62  Firearm_Fatalities                                 2323 non-null   object
 63  Motor_Vehicle_Crash_Deaths                         2743 non-null   object
 64  Voter_Turnout                                      3164 non-null   object
 65  Census_Participation                               3142 non-null   object
 66  Traffic_Volume                                     3041 non-null   object
 67  Homeownership                                      3194 non-null   object
 68  Severe_Housing_Cost_Burden                         3189 non-null   object
 69  Broadband_Access                                   3194 non-null   object
 70  Population                                         3194 non-null   object
 71  percent_Below_18_Years_of_Age                      3194 non-null   object
 72  percent_65_and_Older                               3194 non-null   object
 73  percent_Non_Hispanic_Black                         3194 non-null   object
 74  percent_American_Indian_or_Alaska_Native           3194 non-null   object
 75  percent_Asian                                      3194 non-null   object
 76  percent_Native_Hawaiian_or_Other_Pacific_Islander  3194 non-null   object
 77  percent_Hispanic                                   3194 non-null   object
 78  percent_Non_Hispanic_White                         3194 non-null   object
 79  percent_Not_Proficient_in_English                  3194 non-null   object
 80  percent_Female                                     3194 non-null   object
 81  percent_Rural                                      3187 non-null   object
dtypes: object(82)
memory usage: 2.0+ MB
In [22]:
print(df.head(2).T.to_string())
                                                               0             1
State_FIPS_Code                                               00            01
County_FIPS_Code                                             000           000
5_digit_FIPS_Code                                          00000         01000
State_Abbreviation                                            US            AL
Name                                               United States       Alabama
Premature_Death                                     7281.9355638  10350.071456
Poor_or_Fair_Health                                         0.12         0.189
Poor_Physical_Health_Days                                      3  3.4824161407
Poor_Mental_Health_Days                                      4.4  5.0732772786
Low_Birthweight                                     0.0819065527  0.1043276003
Adult_Smoking                                               0.16         0.195
Adult_Obesity                                               0.32         0.393
Food_Environment_Index                                         7           5.3
Physical_Inactivity                                         0.22         0.278
Access_to_Exercise_Opportunities                    0.8423863046  0.6092667226
Excessive_Drinking                                          0.19  0.1614162693
Alcohol_Impaired_Driving_Deaths                     0.2655507901   0.258869637
Sexually_Transmitted_Infections                            481.3         552.2
Teen_Births                                         19.300572586  27.598889304
Uninsured                                           0.1044496729  0.1182271569
Primary_Care_Physicians                             0.0007637606  0.0006579252
Dentists                                            0.0007246807  0.0004869166
Mental_Health_Providers                             0.0029570126  0.0012541973
Preventable_Hospital_Stays                                  2809          3599
Mammography_Screening                                       0.37          0.36
Flu_Vaccinations                                            0.51          0.44
High_School_Completion                              0.8887404032  0.8740270016
Some_College                                        0.6725325979  0.6150082742
Unemployment                                        0.0535291312  0.0343902829
Children_in_Poverty                                        0.169         0.227
Income_Inequality                                   4.8913749294  5.1766763312
Children_in_Single_Parent_Households                0.2512967212  0.3090921916
Social_Associations                                 9.1296963648  11.910925297
Injury_Deaths                                       75.899512272    86.9057184
Air_Pollution___Particulate_Matter                           7.4           9.3
Drinking_Water_Violations                                    NaN  0.1343283582
Severe_Housing_Problems                             0.1696721824  0.1315678879
Driving_Alone_to_Work                                0.732358592  0.8378249329
Long_Commute___Driving_Alone                               0.365          0.35
Life_Expectancy                                     78.528894654   74.83594896
Premature_Age_Adjusted_Mortality                     358.7460227  499.86855039
Frequent_Physical_Distress                                  0.09  0.1107739678
Frequent_Mental_Distress                                    0.14  0.1648429623
Diabetes_Prevalence                                         0.09          0.13
HIV_Prevalence                                             379.7         341.6
Food_Insecurity                                            0.118         0.145
Limited_Access_to_Healthy_Foods                     0.0610019647  0.0876054853
Insufficient_Sleep                                          0.33  0.3924300962
Uninsured_Adults                                     0.123766561  0.1491000099
Uninsured_Children                                  0.0539542665  0.0362680404
Other_Primary_Care_Providers                        0.0012318702  0.0010861376
High_School_Graduation                                      0.87  0.9071081634
Reading_Scores                                            3.0534   2.885602535
Math_Scores                                                3.003    2.72218766
School_Segregation                                        0.2454  0.2817412656
School_Funding_Adequacy                                     1062     -3868.511
Gender_Pay_Gap                                      0.8100444614  0.7418970988
Median_Household_Income                                    69717         53990
Children_Eligible_for_Free_or_Reduced_Price_Lunch   0.5308547682    0.53338294
Child_Care_Cost_Burden                              0.2659357065  0.2722218184
Child_Care_Centers                                  6.8638668282  5.5092316855
Suicides                                            13.818282988  16.200669652
Firearm_Fatalities                                  12.430330228  22.293899524
Motor_Vehicle_Crash_Deaths                          11.591311264  20.205514853
Voter_Turnout                                       0.6790952146  0.6263600041
Census_Participation                                       0.652           NaN
Traffic_Volume                                            505.31  213.69282656
Homeownership                                        0.646331101  0.6939478703
Severe_Housing_Cost_Burden                          0.1427574897  0.1194424811
Broadband_Access                                    0.8700069587  0.8204571454
Population                                             331893745       5039877
percent_Below_18_Years_of_Age                       0.2216565817  0.2226744819
percent_65_and_Older                                0.1682705801  0.1763568833
percent_Non_Hispanic_Black                          0.1261202919  0.2651199623
percent_American_Indian_or_Alaska_Native            0.0131594526  0.0071444204
percent_Asian                                       0.0613162595  0.0155043466
percent_Native_Hawaiian_or_Other_Pacific_Islander   0.0026003593  0.0010883202
percent_Hispanic                                    0.1887563262  0.0478519615
percent_Non_Hispanic_White                          0.5930615866  0.6487709918
percent_Not_Proficient_in_English                   0.0410440385  0.0102759588
percent_Female                                      0.5047067187  0.5142542169
percent_Rural                                              0.193   0.409631829

Every column is currently stored as object (string). All columns except State_Abbreviation and Name can be converted to numeric types.

In [21]:
# Normalize any missing entries (such as None) to np.nan
df.fillna(np.nan, inplace=True)
In [22]:
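# Columns to convert to numeric types: everything except State_Abbreviation and Name
# Note: the FIPS code columns become integers, so leading zeros (e.g. "01") are lost
to_float= [col for col in list(df.columns) if col not in list(df.columns[3:5])]
df[to_float] = df[to_float].apply(pd.to_numeric)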
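If the FIPS codes are needed later as join keys, one option is to leave them as strings and convert only the measurement columns. The sketch below uses a toy frame; the lists `id_cols` and `value_cols` are hypothetical names, and `errors="coerce"` is an added assumption that turns any unparseable entry into NaN instead of raising an error.

```python
import pandas as pd

# Toy frame: every column starts out as strings, like the real data.
toy = pd.DataFrame({
    "State_FIPS_Code": ["01", "56"],
    "County_FIPS_Code": ["000", "001"],
    "State_Abbreviation": ["AL", "WY"],
    "Name": ["Alabama", "Albany County"],
    "Adult_Obesity": ["0.393", None],
    "Population": ["5039877", "n/a"],  # "n/a" is a made-up bad entry
})

# Identifier columns stay as strings so leading zeros survive.
id_cols = ["State_FIPS_Code", "County_FIPS_Code", "State_Abbreviation", "Name"]
value_cols = [c for c in toy.columns if c not in id_cols]

converted = toy.copy()
converted[value_cols] = converted[value_cols].apply(pd.to_numeric, errors="coerce")

print(converted.dtypes)
```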
In [23]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3194 entries, 0 to 3193
Data columns (total 82 columns):
 #   Column                                             Non-Null Count  Dtype  
---  ------                                             --------------  -----  
 0   State_FIPS_Code                                    3194 non-null   int64  
 1   County_FIPS_Code                                   3194 non-null   int64  
 2   5_digit_FIPS_Code                                  3194 non-null   int64  
 3   State_Abbreviation                                 3194 non-null   object 
 4   Name                                               3194 non-null   object 
 5   Premature_Death                                    3134 non-null   float64
 6   Poor_or_Fair_Health                                3192 non-null   float64
 7   Poor_Physical_Health_Days                          3192 non-null   float64
 8   Poor_Mental_Health_Days                            3192 non-null   float64
 9   Low_Birthweight                                    3088 non-null   float64
 10  Adult_Smoking                                      3192 non-null   float64
 11  Adult_Obesity                                      3192 non-null   float64
 12  Food_Environment_Index                             3161 non-null   float64
 13  Physical_Inactivity                                3192 non-null   float64
 14  Access_to_Exercise_Opportunities                   3132 non-null   float64
 15  Excessive_Drinking                                 3192 non-null   float64
 16  Alcohol_Impaired_Driving_Deaths                    3167 non-null   float64
 17  Sexually_Transmitted_Infections                    3071 non-null   float64
 18  Teen_Births                                        3005 non-null   float64
 19  Uninsured                                          3193 non-null   float64
 20  Primary_Care_Physicians                            3047 non-null   float64
 21  Dentists                                           3108 non-null   float64
 22  Mental_Health_Providers                            2993 non-null   float64
 23  Preventable_Hospital_Stays                         3123 non-null   float64
 24  Mammography_Screening                              3173 non-null   float64
 25  Flu_Vaccinations                                   3176 non-null   float64
 26  High_School_Completion                             3194 non-null   float64
 27  Some_College                                       3194 non-null   float64
 28  Unemployment                                       3193 non-null   float64
 29  Children_in_Poverty                                3193 non-null   float64
 30  Income_Inequality                                  3187 non-null   float64
 31  Children_in_Single_Parent_Households               3193 non-null   float64
 32  Social_Associations                                3194 non-null   float64
 33  Injury_Deaths                                      3089 non-null   float64
 34  Air_Pollution___Particulate_Matter                 3167 non-null   float64
 35  Drinking_Water_Violations                          3149 non-null   float64
 36  Severe_Housing_Problems                            3194 non-null   float64
 37  Driving_Alone_to_Work                              3194 non-null   float64
 38  Long_Commute___Driving_Alone                       3194 non-null   float64
 39  Life_Expectancy                                    3124 non-null   float64
 40  Premature_Age_Adjusted_Mortality                   3134 non-null   float64
 41  Frequent_Physical_Distress                         3192 non-null   float64
 42  Frequent_Mental_Distress                           3192 non-null   float64
 43  Diabetes_Prevalence                                3192 non-null   float64
 44  HIV_Prevalence                                     2735 non-null   float64
 45  Food_Insecurity                                    3194 non-null   float64
 46  Limited_Access_to_Healthy_Foods                    3161 non-null   float64
 47  Insufficient_Sleep                                 3192 non-null   float64
 48  Uninsured_Adults                                   3193 non-null   float64
 49  Uninsured_Children                                 3193 non-null   float64
 50  Other_Primary_Care_Providers                       3183 non-null   float64
 51  High_School_Graduation                             2362 non-null   float64
 52  Reading_Scores                                     2826 non-null   float64
 53  Math_Scores                                        2739 non-null   float64
 54  School_Segregation                                 2962 non-null   float64
 55  School_Funding_Adequacy                            3133 non-null   float64
 56  Gender_Pay_Gap                                     3187 non-null   float64
 57  Median_Household_Income                            3192 non-null   float64
 58  Children_Eligible_for_Free_or_Reduced_Price_Lunch  2606 non-null   float64
 59  Child_Care_Cost_Burden                             3192 non-null   float64
 60  Child_Care_Centers                                 3044 non-null   float64
 61  Suicides                                           2485 non-null   float64
 62  Firearm_Fatalities                                 2323 non-null   float64
 63  Motor_Vehicle_Crash_Deaths                         2743 non-null   float64
 64  Voter_Turnout                                      3164 non-null   float64
 65  Census_Participation                               3142 non-null   float64
 66  Traffic_Volume                                     3041 non-null   float64
 67  Homeownership                                      3194 non-null   float64
 68  Severe_Housing_Cost_Burden                         3189 non-null   float64
 69  Broadband_Access                                   3194 non-null   float64
 70  Population                                         3194 non-null   int64  
 71  percent_Below_18_Years_of_Age                      3194 non-null   float64
 72  percent_65_and_Older                               3194 non-null   float64
 73  percent_Non_Hispanic_Black                         3194 non-null   float64
 74  percent_American_Indian_or_Alaska_Native           3194 non-null   float64
 75  percent_Asian                                      3194 non-null   float64
 76  percent_Native_Hawaiian_or_Other_Pacific_Islander  3194 non-null   float64
 77  percent_Hispanic                                   3194 non-null   float64
 78  percent_Non_Hispanic_White                         3194 non-null   float64
 79  percent_Not_Proficient_in_English                  3194 non-null   float64
 80  percent_Female                                     3194 non-null   float64
 81  percent_Rural                                      3187 non-null   float64
dtypes: float64(76), int64(4), object(2)
memory usage: 2.0+ MB
In [24]:
df.describe()
Out[24]:
State_FIPS_Code County_FIPS_Code 5_digit_FIPS_Code Premature_Death Poor_or_Fair_Health Poor_Physical_Health_Days Poor_Mental_Health_Days Low_Birthweight Adult_Smoking Adult_Obesity ... percent_65_and_Older percent_Non_Hispanic_Black percent_American_Indian_or_Alaska_Native percent_Asian percent_Native_Hawaiian_or_Other_Pacific_Islander percent_Hispanic percent_Non_Hispanic_White percent_Not_Proficient_in_English percent_Female percent_Rural
count 3194.000000 3194.000000 3194.000000 3134.000000 3192.000000 3192.000000 3192.000000 3088.000000 3192.000000 3192.000000 ... 3194.000000 3194.000000 3194.000000 3194.000000 3194.000000 3194.000000 3194.000000 3194.000000 3194.000000 3187.000000
mean 30.249530 101.886662 30351.417032 8891.562734 0.159942 3.511726 4.794971 0.082138 0.199762 0.361428 ... 0.199929 0.090869 0.024611 0.016971 0.001625 0.102183 0.749862 0.016072 0.495715 0.580467
std 15.160981 107.624838 15179.045587 2929.948857 0.044333 0.652486 0.628114 0.020293 0.041210 0.046825 ... 0.047879 0.141564 0.077649 0.030939 0.009667 0.139670 0.202763 0.026852 0.023189 0.315553
min 0.000000 0.000000 0.000000 3090.426825 0.065000 1.849017 2.779181 0.028871 0.067000 0.176000 ... 0.050729 0.000000 0.000000 0.000000 0.000000 0.006827 0.026802 0.000000 0.245614 0.000000
25% 18.000000 33.000000 18171.500000 6868.647904 0.125000 3.027309 4.373272 0.068281 0.174000 0.336000 ... 0.169189 0.008182 0.004311 0.005208 0.000377 0.026999 0.630136 0.002579 0.490583 0.325275
50% 29.000000 77.000000 29174.000000 8538.518058 0.152000 3.448386 4.813037 0.079532 0.198000 0.366000 ... 0.195519 0.024266 0.007193 0.008099 0.000721 0.048823 0.821402 0.007069 0.499580 0.588250
75% 45.000000 133.000000 45074.500000 10494.403953 0.189000 3.946273 5.221064 0.091418 0.226000 0.391000 ... 0.225286 0.104902 0.014716 0.015733 0.001357 0.108941 0.915241 0.017709 0.507127 0.861214
max 56.000000 840.000000 56045.000000 30007.870277 0.368000 6.335031 6.945581 0.216981 0.411000 0.532000 ... 0.581710 0.856197 0.922567 0.420553 0.475610 0.962604 0.975921 0.384369 0.570535 1.000000

8 rows × 80 columns

Plotting the data¶

Relationship between insufficient sleep and adult obesity in Louisiana (LA) and California (CA)

In [25]:
x = "Adult_Obesity"
y = "Insufficient_Sleep"
z = "State_Abbreviation"
not_null_mask = df[[x,y,z]].notnull().all(axis=1)
not_null_rows = df[[x,y,z]][not_null_mask]

not_null_rows = not_null_rows.query('State_Abbreviation== "LA" or State_Abbreviation== "CA"')
In [26]:
sns.scatterplot(data=not_null_rows, x = x, y = y, hue = z)
Out[26]:
<Axes: xlabel='Adult_Obesity', ylabel='Insufficient_Sleep'>

Checking correlations between a few columns¶

In [27]:
sns.scatterplot(data=df, x = "Broadband_Access", y = "Math_Scores")
Out[27]:
<Axes: xlabel='Broadband_Access', ylabel='Math_Scores'>
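A scatter plot shows the shape of a relationship, and a correlation coefficient can put a number on it. The following sketch computes the Pearson coefficient for one pair of columns after dropping rows where either value is missing. The toy values are made up and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Toy values (made up) standing in for two columns of the real frame.
toy = pd.DataFrame({
    "Broadband_Access": [0.70, 0.75, 0.80, 0.85, 0.90, np.nan],
    "Math_Scores": [2.6, 2.8, 2.9, 3.1, 3.2, 3.0],
})

# Keep only rows where both values are present.
pair = toy[["Broadband_Access", "Math_Scores"]].dropna()

# Series.corr uses the Pearson method by default.
r = pair["Broadband_Access"].corr(pair["Math_Scores"])
print(round(r, 3))
```

A high coefficient only indicates a linear association; it does not say that one variable causes the other.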

Splitting the columns into health factors (variables) and health outcomes (targets)

In [28]:
target_cols = ['Premature_Death', 'Life_Expectancy', 'Premature_Age_Adjusted_Mortality', 
'Poor_or_Fair_Health','Poor_Physical_Health_Days', 'Poor_Mental_Health_Days','Low_Birthweight', 
'Frequent_Physical_Distress','Frequent_Mental_Distress', 'Diabetes_Prevalence', 'HIV_Prevalence']
In [29]:
variable_cols = [x for x in df.columns[5:] if x not in target_cols]
In [30]:
df_corr = df.iloc[:,5:].corr()
df_corr.shape
Out[30]:
(77, 77)

Correlation measurements¶

In [31]:
df_corr = df_corr[variable_cols]
df_corr = df_corr.loc[target_cols]
df_corr.shape
Out[31]:
(11, 66)
In [36]:
# Set the figure size before plotting so that it applies to this heatmap
plt.figure(figsize=(10, 15))
sns.heatmap(df_corr.T, annot=True, annot_kws={"fontsize": 7})
plt.xticks(fontsize=8)
plt.yticks(fontsize=9)

Finding the features most correlated with obesity

In [33]:
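# Sort all features by their correlation with Adult_Obesity, most negative first
obesity_corr = list(df.iloc[:, 5:].corr()[["Adult_Obesity"]].sort_values(by = "Adult_Obesity").index)
# Keep the 5 most negative and the 6 most positive; the last entry is Adult_Obesity itself
obesity_corr = obesity_corr[:5] + obesity_corr[-7:-1]
obesity_corr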
Out[33]:
['Life_Expectancy',
 'Median_Household_Income',
 'Some_College',
 'Voter_Turnout',
 'Broadband_Access',
 'Premature_Age_Adjusted_Mortality',
 'Frequent_Physical_Distress',
 'Adult_Smoking',
 'Poor_or_Fair_Health',
 'Diabetes_Prevalence',
 'Physical_Inactivity']
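The list above mixes the most negative and the most positive correlations. An alternative is to rank features by the absolute value of the correlation while keeping the sign visible. This is a sketch only: the helper `strongest_correlations` is hypothetical and the toy values are made up.

```python
import pandas as pd

def strongest_correlations(frame, target, k=3):
    """Return the k features most strongly correlated with target, by absolute value."""
    corr = frame.select_dtypes("number").corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.loc[order].head(k)

# Toy frame with made-up values.
toy = pd.DataFrame({
    "Adult_Obesity": [0.30, 0.32, 0.35, 0.38, 0.40],
    "Physical_Inactivity": [0.20, 0.22, 0.25, 0.27, 0.30],
    "Life_Expectancy": [80.0, 79.0, 78.5, 77.0, 76.0],
    "Noise": [1.0, -1.0, 0.5, -0.5, 0.0],
})

strongest = strongest_correlations(toy, "Adult_Obesity", k=2)
print(strongest)
```

Returning a Series rather than a list of names keeps the coefficient next to each feature, so the direction of each relationship is not lost.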

A few more plots

In [42]:
# Set the figure size before plotting so that it applies to this plot
plt.figure(figsize=(6, 6))
sns.scatterplot(data=df, x="Median_Household_Income", y="Adult_Obesity")
In [43]:
# Set the figure size before plotting so that it applies to this plot
plt.figure(figsize=(6, 6))
sns.scatterplot(data=df, x="Adult_Smoking", y="Adult_Obesity")

Extracting state averages¶

In [44]:
state_df = df[ pd.to_numeric(df["County_FIPS_Code"]) == 0]

Plot the obesity rates of all the states, including the national average¶

In [46]:
# Set the figure size before plotting so that it applies to this plot
plt.figure(figsize=(10, 9))
sns.barplot(data=state_df.sort_values(by="Adult_Obesity"), x="Adult_Obesity", y="State_Abbreviation")
plt.ylabel("State abbreviation")
plt.xlabel("Adult obesity")
plt.yticks(fontsize=8)
plt.title("Obesity rates among adults in different US states", fontsize=20)

Closing Thoughts and Final Goals¶

I plan to explore why some states or counties have good health outcomes while others do not. Other questions include: "Which factors influence health outcomes the most?", "What affects obesity the most?", "Does the state or county location matter for health outcomes?", and "Why do certain demographics correlate with health results?"

Hopefully, I can find more data and variables to merge with this dataset and, with better analysis, decide which variables to include in a model. The model will be used to predict health outcomes such as mortality or obesity from readily available data.